{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 电影数据分析\n",
    "\n",
    "\n",
    "### 准备工作\n",
    "\n",
    "从网站 grouplens.org/datasets/movielens 下载 [MovieLens 1M Dataset](http://files.grouplens.org/datasets/movielens/ml-1m.zip) 数据。\n",
    "\n",
    "### 数据说明\n",
    "\n",
    "参阅数据介绍文件 [README.txt](http://files.grouplens.org/datasets/movielens/ml-1m-README.txt)\n",
    "\n",
    "### 利用 Pandas 分析电影评分数据\n",
    "\n",
    "* 数据读取\n",
    "* 数据合并\n",
    "* 统计电影平均得分\n",
    "* 统计活跃电影 -> 获得评分的次数越多说明电影越活跃\n",
    "* 女生最喜欢的电影排行榜\n",
    "* 男生最喜欢的电影排行榜\n",
    "* 男女生评分差距最大的电影 -> 某类电影女生喜欢，但男生不喜欢\n",
    "* 最具争议的电影排行榜 -> 评分的方差最大\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 数据读取"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "user_names = ['user_id', 'gender', 'age', 'occupation', 'zip']\n",
    "users = pd.read_table('ml-1m/users.dat', sep='::', header=None, names=user_names, engine='python')\n",
    "\n",
    "rating_names = ['user_id', 'movie_id', 'rating', 'timestamp']\n",
    "ratings = pd.read_table('ml-1m/ratings.dat', sep='::', header=None, names=rating_names, engine='python')\n",
    "\n",
    "movie_names = ['movie_id', 'title', 'genres']\n",
    "movies = pd.read_table('ml-1m/movies.dat', sep='::', header=None, names=movie_names, engine='python')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "6040\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>gender</th>\n",
       "      <th>age</th>\n",
       "      <th>occupation</th>\n",
       "      <th>zip</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>M</td>\n",
       "      <td>56</td>\n",
       "      <td>16</td>\n",
       "      <td>70072</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>M</td>\n",
       "      <td>25</td>\n",
       "      <td>15</td>\n",
       "      <td>55117</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>M</td>\n",
       "      <td>45</td>\n",
       "      <td>7</td>\n",
       "      <td>02460</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>M</td>\n",
       "      <td>25</td>\n",
       "      <td>20</td>\n",
       "      <td>55455</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   user_id gender  age  occupation    zip\n",
       "0        1      F    1          10  48067\n",
       "1        2      M   56          16  70072\n",
       "2        3      M   25          15  55117\n",
       "3        4      M   45           7  02460\n",
       "4        5      M   25          20  55455"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print len(users)\n",
    "users.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1000209\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>movie_id</th>\n",
       "      <th>rating</th>\n",
       "      <th>timestamp</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>1193</td>\n",
       "      <td>5</td>\n",
       "      <td>978300760</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>661</td>\n",
       "      <td>3</td>\n",
       "      <td>978302109</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>914</td>\n",
       "      <td>3</td>\n",
       "      <td>978301968</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>3408</td>\n",
       "      <td>4</td>\n",
       "      <td>978300275</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>2355</td>\n",
       "      <td>5</td>\n",
       "      <td>978824291</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   user_id  movie_id  rating  timestamp\n",
       "0        1      1193       5  978300760\n",
       "1        1       661       3  978302109\n",
       "2        1       914       3  978301968\n",
       "3        1      3408       4  978300275\n",
       "4        1      2355       5  978824291"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print len(ratings)\n",
    "ratings.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3883\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>movie_id</th>\n",
       "      <th>title</th>\n",
       "      <th>genres</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>Toy Story (1995)</td>\n",
       "      <td>Animation|Children's|Comedy</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>Jumanji (1995)</td>\n",
       "      <td>Adventure|Children's|Fantasy</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>Grumpier Old Men (1995)</td>\n",
       "      <td>Comedy|Romance</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>Waiting to Exhale (1995)</td>\n",
       "      <td>Comedy|Drama</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>Father of the Bride Part II (1995)</td>\n",
       "      <td>Comedy</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   movie_id                               title                        genres\n",
       "0         1                    Toy Story (1995)   Animation|Children's|Comedy\n",
       "1         2                      Jumanji (1995)  Adventure|Children's|Fantasy\n",
       "2         3             Grumpier Old Men (1995)                Comedy|Romance\n",
       "3         4            Waiting to Exhale (1995)                  Comedy|Drama\n",
       "4         5  Father of the Bride Part II (1995)                        Comedy"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print len(movies)\n",
    "movies.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 数据合并"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "data = pd.merge(pd.merge(users, ratings), movies)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>gender</th>\n",
       "      <th>age</th>\n",
       "      <th>occupation</th>\n",
       "      <th>zip</th>\n",
       "      <th>movie_id</th>\n",
       "      <th>rating</th>\n",
       "      <th>timestamp</th>\n",
       "      <th>title</th>\n",
       "      <th>genres</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>1193</td>\n",
       "      <td>5</td>\n",
       "      <td>978300760</td>\n",
       "      <td>One Flew Over the Cuckoo's Nest (1975)</td>\n",
       "      <td>Drama</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>M</td>\n",
       "      <td>56</td>\n",
       "      <td>16</td>\n",
       "      <td>70072</td>\n",
       "      <td>1193</td>\n",
       "      <td>5</td>\n",
       "      <td>978298413</td>\n",
       "      <td>One Flew Over the Cuckoo's Nest (1975)</td>\n",
       "      <td>Drama</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>12</td>\n",
       "      <td>M</td>\n",
       "      <td>25</td>\n",
       "      <td>12</td>\n",
       "      <td>32793</td>\n",
       "      <td>1193</td>\n",
       "      <td>4</td>\n",
       "      <td>978220179</td>\n",
       "      <td>One Flew Over the Cuckoo's Nest (1975)</td>\n",
       "      <td>Drama</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>15</td>\n",
       "      <td>M</td>\n",
       "      <td>25</td>\n",
       "      <td>7</td>\n",
       "      <td>22903</td>\n",
       "      <td>1193</td>\n",
       "      <td>4</td>\n",
       "      <td>978199279</td>\n",
       "      <td>One Flew Over the Cuckoo's Nest (1975)</td>\n",
       "      <td>Drama</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>17</td>\n",
       "      <td>M</td>\n",
       "      <td>50</td>\n",
       "      <td>1</td>\n",
       "      <td>95350</td>\n",
       "      <td>1193</td>\n",
       "      <td>5</td>\n",
       "      <td>978158471</td>\n",
       "      <td>One Flew Over the Cuckoo's Nest (1975)</td>\n",
       "      <td>Drama</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   user_id gender  age  occupation    zip  movie_id  rating  timestamp  \\\n",
       "0        1      F    1          10  48067      1193       5  978300760   \n",
       "1        2      M   56          16  70072      1193       5  978298413   \n",
       "2       12      M   25          12  32793      1193       4  978220179   \n",
       "3       15      M   25           7  22903      1193       4  978199279   \n",
       "4       17      M   50           1  95350      1193       5  978158471   \n",
       "\n",
       "                                    title genres  \n",
       "0  One Flew Over the Cuckoo's Nest (1975)  Drama  \n",
       "1  One Flew Over the Cuckoo's Nest (1975)  Drama  \n",
       "2  One Flew Over the Cuckoo's Nest (1975)  Drama  \n",
       "3  One Flew Over the Cuckoo's Nest (1975)  Drama  \n",
       "4  One Flew Over the Cuckoo's Nest (1975)  Drama  "
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(data)\n",
    "data.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>gender</th>\n",
       "      <th>age</th>\n",
       "      <th>occupation</th>\n",
       "      <th>zip</th>\n",
       "      <th>movie_id</th>\n",
       "      <th>rating</th>\n",
       "      <th>timestamp</th>\n",
       "      <th>title</th>\n",
       "      <th>genres</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>1193</td>\n",
       "      <td>5</td>\n",
       "      <td>978300760</td>\n",
       "      <td>One Flew Over the Cuckoo's Nest (1975)</td>\n",
       "      <td>Drama</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1725</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>661</td>\n",
       "      <td>3</td>\n",
       "      <td>978302109</td>\n",
       "      <td>James and the Giant Peach (1996)</td>\n",
       "      <td>Animation|Children's|Musical</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2250</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>914</td>\n",
       "      <td>3</td>\n",
       "      <td>978301968</td>\n",
       "      <td>My Fair Lady (1964)</td>\n",
       "      <td>Musical|Romance</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2886</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>3408</td>\n",
       "      <td>4</td>\n",
       "      <td>978300275</td>\n",
       "      <td>Erin Brockovich (2000)</td>\n",
       "      <td>Drama</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4201</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>2355</td>\n",
       "      <td>5</td>\n",
       "      <td>978824291</td>\n",
       "      <td>Bug's Life, A (1998)</td>\n",
       "      <td>Animation|Children's|Comedy</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5904</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>1197</td>\n",
       "      <td>3</td>\n",
       "      <td>978302268</td>\n",
       "      <td>Princess Bride, The (1987)</td>\n",
       "      <td>Action|Adventure|Comedy|Romance</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8222</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>1287</td>\n",
       "      <td>5</td>\n",
       "      <td>978302039</td>\n",
       "      <td>Ben-Hur (1959)</td>\n",
       "      <td>Action|Adventure|Drama</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8926</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>2804</td>\n",
       "      <td>5</td>\n",
       "      <td>978300719</td>\n",
       "      <td>Christmas Story, A (1983)</td>\n",
       "      <td>Comedy|Drama</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10278</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>594</td>\n",
       "      <td>4</td>\n",
       "      <td>978302268</td>\n",
       "      <td>Snow White and the Seven Dwarfs (1937)</td>\n",
       "      <td>Animation|Children's|Musical</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11041</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>919</td>\n",
       "      <td>4</td>\n",
       "      <td>978301368</td>\n",
       "      <td>Wizard of Oz, The (1939)</td>\n",
       "      <td>Adventure|Children's|Drama|Musical</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12759</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>595</td>\n",
       "      <td>5</td>\n",
       "      <td>978824268</td>\n",
       "      <td>Beauty and the Beast (1991)</td>\n",
       "      <td>Animation|Children's|Musical</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13819</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>938</td>\n",
       "      <td>4</td>\n",
       "      <td>978301752</td>\n",
       "      <td>Gigi (1958)</td>\n",
       "      <td>Musical</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14006</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>2398</td>\n",
       "      <td>4</td>\n",
       "      <td>978302281</td>\n",
       "      <td>Miracle on 34th Street (1947)</td>\n",
       "      <td>Drama</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14386</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>2918</td>\n",
       "      <td>4</td>\n",
       "      <td>978302124</td>\n",
       "      <td>Ferris Bueller's Day Off (1986)</td>\n",
       "      <td>Comedy</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15859</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>1035</td>\n",
       "      <td>5</td>\n",
       "      <td>978301753</td>\n",
       "      <td>Sound of Music, The (1965)</td>\n",
       "      <td>Musical</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16741</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>2791</td>\n",
       "      <td>4</td>\n",
       "      <td>978302188</td>\n",
       "      <td>Airplane! (1980)</td>\n",
       "      <td>Comedy</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18472</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>2687</td>\n",
       "      <td>3</td>\n",
       "      <td>978824268</td>\n",
       "      <td>Tarzan (1999)</td>\n",
       "      <td>Animation|Children's</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18914</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>2018</td>\n",
       "      <td>4</td>\n",
       "      <td>978301777</td>\n",
       "      <td>Bambi (1942)</td>\n",
       "      <td>Animation|Children's</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19503</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>3105</td>\n",
       "      <td>5</td>\n",
       "      <td>978301713</td>\n",
       "      <td>Awakenings (1990)</td>\n",
       "      <td>Drama</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20183</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>2797</td>\n",
       "      <td>4</td>\n",
       "      <td>978302039</td>\n",
       "      <td>Big (1988)</td>\n",
       "      <td>Comedy|Fantasy</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21674</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>2321</td>\n",
       "      <td>3</td>\n",
       "      <td>978302205</td>\n",
       "      <td>Pleasantville (1998)</td>\n",
       "      <td>Comedy</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22832</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>720</td>\n",
       "      <td>3</td>\n",
       "      <td>978300760</td>\n",
       "      <td>Wallace &amp; Gromit: The Best of Aardman Animatio...</td>\n",
       "      <td>Animation</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23270</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>1270</td>\n",
       "      <td>5</td>\n",
       "      <td>978300055</td>\n",
       "      <td>Back to the Future (1985)</td>\n",
       "      <td>Comedy|Sci-Fi</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25853</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>527</td>\n",
       "      <td>5</td>\n",
       "      <td>978824195</td>\n",
       "      <td>Schindler's List (1993)</td>\n",
       "      <td>Drama|War</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28157</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>2340</td>\n",
       "      <td>3</td>\n",
       "      <td>978300103</td>\n",
       "      <td>Meet Joe Black (1998)</td>\n",
       "      <td>Romance</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28501</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>48</td>\n",
       "      <td>5</td>\n",
       "      <td>978824351</td>\n",
       "      <td>Pocahontas (1995)</td>\n",
       "      <td>Animation|Children's|Musical|Romance</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28883</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>1097</td>\n",
       "      <td>4</td>\n",
       "      <td>978301953</td>\n",
       "      <td>E.T. the Extra-Terrestrial (1982)</td>\n",
       "      <td>Children's|Drama|Fantasy|Sci-Fi</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31152</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>1721</td>\n",
       "      <td>4</td>\n",
       "      <td>978300055</td>\n",
       "      <td>Titanic (1997)</td>\n",
       "      <td>Drama|Romance</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32698</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>1545</td>\n",
       "      <td>4</td>\n",
       "      <td>978824139</td>\n",
       "      <td>Ponette (1996)</td>\n",
       "      <td>Drama</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32771</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>745</td>\n",
       "      <td>3</td>\n",
       "      <td>978824268</td>\n",
       "      <td>Close Shave, A (1995)</td>\n",
       "      <td>Animation|Comedy|Thriller</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33428</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>2294</td>\n",
       "      <td>4</td>\n",
       "      <td>978824291</td>\n",
       "      <td>Antz (1998)</td>\n",
       "      <td>Animation|Children's</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34073</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>3186</td>\n",
       "      <td>4</td>\n",
       "      <td>978300019</td>\n",
       "      <td>Girl, Interrupted (1999)</td>\n",
       "      <td>Drama</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34504</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>1566</td>\n",
       "      <td>4</td>\n",
       "      <td>978824330</td>\n",
       "      <td>Hercules (1997)</td>\n",
       "      <td>Adventure|Animation|Children's|Comedy|Musical</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34973</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>588</td>\n",
       "      <td>4</td>\n",
       "      <td>978824268</td>\n",
       "      <td>Aladdin (1992)</td>\n",
       "      <td>Animation|Children's|Comedy|Musical</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36324</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>1907</td>\n",
       "      <td>4</td>\n",
       "      <td>978824330</td>\n",
       "      <td>Mulan (1998)</td>\n",
       "      <td>Animation|Children's</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36814</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>783</td>\n",
       "      <td>4</td>\n",
       "      <td>978824291</td>\n",
       "      <td>Hunchback of Notre Dame, The (1996)</td>\n",
       "      <td>Animation|Children's|Musical</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37204</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>1836</td>\n",
       "      <td>5</td>\n",
       "      <td>978300172</td>\n",
       "      <td>Last Days of Disco, The (1998)</td>\n",
       "      <td>Drama</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37339</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>1022</td>\n",
       "      <td>5</td>\n",
       "      <td>978300055</td>\n",
       "      <td>Cinderella (1950)</td>\n",
       "      <td>Animation|Children's|Musical</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37916</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>2762</td>\n",
       "      <td>4</td>\n",
       "      <td>978302091</td>\n",
       "      <td>Sixth Sense, The (1999)</td>\n",
       "      <td>Thriller</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>40375</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>150</td>\n",
       "      <td>5</td>\n",
       "      <td>978301777</td>\n",
       "      <td>Apollo 13 (1995)</td>\n",
       "      <td>Drama</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>41626</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>978824268</td>\n",
       "      <td>Toy Story (1995)</td>\n",
       "      <td>Animation|Children's|Comedy</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>43703</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>1961</td>\n",
       "      <td>5</td>\n",
       "      <td>978301590</td>\n",
       "      <td>Rain Man (1988)</td>\n",
       "      <td>Drama</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>45033</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>1962</td>\n",
       "      <td>4</td>\n",
       "      <td>978301753</td>\n",
       "      <td>Driving Miss Daisy (1989)</td>\n",
       "      <td>Drama</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>45685</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>2692</td>\n",
       "      <td>4</td>\n",
       "      <td>978301570</td>\n",
       "      <td>Run Lola Run (Lola rennt) (1998)</td>\n",
       "      <td>Action|Crime|Romance</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>46757</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>260</td>\n",
       "      <td>4</td>\n",
       "      <td>978300760</td>\n",
       "      <td>Star Wars: Episode IV - A New Hope (1977)</td>\n",
       "      <td>Action|Adventure|Fantasy|Sci-Fi</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>49748</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>1028</td>\n",
       "      <td>5</td>\n",
       "      <td>978301777</td>\n",
       "      <td>Mary Poppins (1964)</td>\n",
       "      <td>Children's|Comedy|Musical</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50759</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>1029</td>\n",
       "      <td>5</td>\n",
       "      <td>978302205</td>\n",
       "      <td>Dumbo (1941)</td>\n",
       "      <td>Animation|Children's|Musical</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51327</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>1207</td>\n",
       "      <td>4</td>\n",
       "      <td>978300719</td>\n",
       "      <td>To Kill a Mockingbird (1962)</td>\n",
       "      <td>Drama</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>52255</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>2028</td>\n",
       "      <td>5</td>\n",
       "      <td>978301619</td>\n",
       "      <td>Saving Private Ryan (1998)</td>\n",
       "      <td>Action|Drama|War</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>54908</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>531</td>\n",
       "      <td>4</td>\n",
       "      <td>978302149</td>\n",
       "      <td>Secret Garden, The (1993)</td>\n",
       "      <td>Children's|Drama</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>55246</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>3114</td>\n",
       "      <td>4</td>\n",
       "      <td>978302174</td>\n",
       "      <td>Toy Story 2 (1999)</td>\n",
       "      <td>Animation|Children's|Comedy</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>56831</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>608</td>\n",
       "      <td>4</td>\n",
       "      <td>978301398</td>\n",
       "      <td>Fargo (1996)</td>\n",
       "      <td>Crime|Drama|Thriller</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>59344</th>\n",
       "      <td>1</td>\n",
       "      <td>F</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>48067</td>\n",
       "      <td>1246</td>\n",
       "      <td>4</td>\n",
       "      <td>978302091</td>\n",
       "      <td>Dead Poets Society (1989)</td>\n",
       "      <td>Drama</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       user_id gender  age  occupation    zip  movie_id  rating  timestamp  \\\n",
       "0            1      F    1          10  48067      1193       5  978300760   \n",
       "1725         1      F    1          10  48067       661       3  978302109   \n",
       "2250         1      F    1          10  48067       914       3  978301968   \n",
       "2886         1      F    1          10  48067      3408       4  978300275   \n",
       "4201         1      F    1          10  48067      2355       5  978824291   \n",
       "5904         1      F    1          10  48067      1197       3  978302268   \n",
       "8222         1      F    1          10  48067      1287       5  978302039   \n",
       "8926         1      F    1          10  48067      2804       5  978300719   \n",
       "10278        1      F    1          10  48067       594       4  978302268   \n",
       "11041        1      F    1          10  48067       919       4  978301368   \n",
       "12759        1      F    1          10  48067       595       5  978824268   \n",
       "13819        1      F    1          10  48067       938       4  978301752   \n",
       "14006        1      F    1          10  48067      2398       4  978302281   \n",
       "14386        1      F    1          10  48067      2918       4  978302124   \n",
       "15859        1      F    1          10  48067      1035       5  978301753   \n",
       "16741        1      F    1          10  48067      2791       4  978302188   \n",
       "18472        1      F    1          10  48067      2687       3  978824268   \n",
       "18914        1      F    1          10  48067      2018       4  978301777   \n",
       "19503        1      F    1          10  48067      3105       5  978301713   \n",
       "20183        1      F    1          10  48067      2797       4  978302039   \n",
       "21674        1      F    1          10  48067      2321       3  978302205   \n",
       "22832        1      F    1          10  48067       720       3  978300760   \n",
       "23270        1      F    1          10  48067      1270       5  978300055   \n",
       "25853        1      F    1          10  48067       527       5  978824195   \n",
       "28157        1      F    1          10  48067      2340       3  978300103   \n",
       "28501        1      F    1          10  48067        48       5  978824351   \n",
       "28883        1      F    1          10  48067      1097       4  978301953   \n",
       "31152        1      F    1          10  48067      1721       4  978300055   \n",
       "32698        1      F    1          10  48067      1545       4  978824139   \n",
       "32771        1      F    1          10  48067       745       3  978824268   \n",
       "33428        1      F    1          10  48067      2294       4  978824291   \n",
       "34073        1      F    1          10  48067      3186       4  978300019   \n",
       "34504        1      F    1          10  48067      1566       4  978824330   \n",
       "34973        1      F    1          10  48067       588       4  978824268   \n",
       "36324        1      F    1          10  48067      1907       4  978824330   \n",
       "36814        1      F    1          10  48067       783       4  978824291   \n",
       "37204        1      F    1          10  48067      1836       5  978300172   \n",
       "37339        1      F    1          10  48067      1022       5  978300055   \n",
       "37916        1      F    1          10  48067      2762       4  978302091   \n",
       "40375        1      F    1          10  48067       150       5  978301777   \n",
       "41626        1      F    1          10  48067         1       5  978824268   \n",
       "43703        1      F    1          10  48067      1961       5  978301590   \n",
       "45033        1      F    1          10  48067      1962       4  978301753   \n",
       "45685        1      F    1          10  48067      2692       4  978301570   \n",
       "46757        1      F    1          10  48067       260       4  978300760   \n",
       "49748        1      F    1          10  48067      1028       5  978301777   \n",
       "50759        1      F    1          10  48067      1029       5  978302205   \n",
       "51327        1      F    1          10  48067      1207       4  978300719   \n",
       "52255        1      F    1          10  48067      2028       5  978301619   \n",
       "54908        1      F    1          10  48067       531       4  978302149   \n",
       "55246        1      F    1          10  48067      3114       4  978302174   \n",
       "56831        1      F    1          10  48067       608       4  978301398   \n",
       "59344        1      F    1          10  48067      1246       4  978302091   \n",
       "\n",
       "                                                   title  \\\n",
       "0                 One Flew Over the Cuckoo's Nest (1975)   \n",
       "1725                    James and the Giant Peach (1996)   \n",
       "2250                                 My Fair Lady (1964)   \n",
       "2886                              Erin Brockovich (2000)   \n",
       "4201                                Bug's Life, A (1998)   \n",
       "5904                          Princess Bride, The (1987)   \n",
       "8222                                      Ben-Hur (1959)   \n",
       "8926                           Christmas Story, A (1983)   \n",
       "10278             Snow White and the Seven Dwarfs (1937)   \n",
       "11041                           Wizard of Oz, The (1939)   \n",
       "12759                        Beauty and the Beast (1991)   \n",
       "13819                                        Gigi (1958)   \n",
       "14006                      Miracle on 34th Street (1947)   \n",
       "14386                    Ferris Bueller's Day Off (1986)   \n",
       "15859                         Sound of Music, The (1965)   \n",
       "16741                                   Airplane! (1980)   \n",
       "18472                                      Tarzan (1999)   \n",
       "18914                                       Bambi (1942)   \n",
       "19503                                  Awakenings (1990)   \n",
       "20183                                         Big (1988)   \n",
       "21674                               Pleasantville (1998)   \n",
       "22832  Wallace & Gromit: The Best of Aardman Animatio...   \n",
       "23270                          Back to the Future (1985)   \n",
       "25853                            Schindler's List (1993)   \n",
       "28157                              Meet Joe Black (1998)   \n",
       "28501                                  Pocahontas (1995)   \n",
       "28883                  E.T. the Extra-Terrestrial (1982)   \n",
       "31152                                     Titanic (1997)   \n",
       "32698                                     Ponette (1996)   \n",
       "32771                              Close Shave, A (1995)   \n",
       "33428                                        Antz (1998)   \n",
       "34073                           Girl, Interrupted (1999)   \n",
       "34504                                    Hercules (1997)   \n",
       "34973                                     Aladdin (1992)   \n",
       "36324                                       Mulan (1998)   \n",
       "36814                Hunchback of Notre Dame, The (1996)   \n",
       "37204                     Last Days of Disco, The (1998)   \n",
       "37339                                  Cinderella (1950)   \n",
       "37916                            Sixth Sense, The (1999)   \n",
       "40375                                   Apollo 13 (1995)   \n",
       "41626                                   Toy Story (1995)   \n",
       "43703                                    Rain Man (1988)   \n",
       "45033                          Driving Miss Daisy (1989)   \n",
       "45685                   Run Lola Run (Lola rennt) (1998)   \n",
       "46757          Star Wars: Episode IV - A New Hope (1977)   \n",
       "49748                                Mary Poppins (1964)   \n",
       "50759                                       Dumbo (1941)   \n",
       "51327                       To Kill a Mockingbird (1962)   \n",
       "52255                         Saving Private Ryan (1998)   \n",
       "54908                          Secret Garden, The (1993)   \n",
       "55246                                 Toy Story 2 (1999)   \n",
       "56831                                       Fargo (1996)   \n",
       "59344                          Dead Poets Society (1989)   \n",
       "\n",
       "                                              genres  \n",
       "0                                              Drama  \n",
       "1725                    Animation|Children's|Musical  \n",
       "2250                                 Musical|Romance  \n",
       "2886                                           Drama  \n",
       "4201                     Animation|Children's|Comedy  \n",
       "5904                 Action|Adventure|Comedy|Romance  \n",
       "8222                          Action|Adventure|Drama  \n",
       "8926                                    Comedy|Drama  \n",
       "10278                   Animation|Children's|Musical  \n",
       "11041             Adventure|Children's|Drama|Musical  \n",
       "12759                   Animation|Children's|Musical  \n",
       "13819                                        Musical  \n",
       "14006                                          Drama  \n",
       "14386                                         Comedy  \n",
       "15859                                        Musical  \n",
       "16741                                         Comedy  \n",
       "18472                           Animation|Children's  \n",
       "18914                           Animation|Children's  \n",
       "19503                                          Drama  \n",
       "20183                                 Comedy|Fantasy  \n",
       "21674                                         Comedy  \n",
       "22832                                      Animation  \n",
       "23270                                  Comedy|Sci-Fi  \n",
       "25853                                      Drama|War  \n",
       "28157                                        Romance  \n",
       "28501           Animation|Children's|Musical|Romance  \n",
       "28883                Children's|Drama|Fantasy|Sci-Fi  \n",
       "31152                                  Drama|Romance  \n",
       "32698                                          Drama  \n",
       "32771                      Animation|Comedy|Thriller  \n",
       "33428                           Animation|Children's  \n",
       "34073                                          Drama  \n",
       "34504  Adventure|Animation|Children's|Comedy|Musical  \n",
       "34973            Animation|Children's|Comedy|Musical  \n",
       "36324                           Animation|Children's  \n",
       "36814                   Animation|Children's|Musical  \n",
       "37204                                          Drama  \n",
       "37339                   Animation|Children's|Musical  \n",
       "37916                                       Thriller  \n",
       "40375                                          Drama  \n",
       "41626                    Animation|Children's|Comedy  \n",
       "43703                                          Drama  \n",
       "45033                                          Drama  \n",
       "45685                           Action|Crime|Romance  \n",
       "46757                Action|Adventure|Fantasy|Sci-Fi  \n",
       "49748                      Children's|Comedy|Musical  \n",
       "50759                   Animation|Children's|Musical  \n",
       "51327                                          Drama  \n",
       "52255                               Action|Drama|War  \n",
       "54908                               Children's|Drama  \n",
       "55246                    Animation|Children's|Comedy  \n",
       "56831                           Crime|Drama|Thriller  \n",
       "59344                                          Drama  "
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[data.user_id == 1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>gender</th>\n",
       "      <th>F</th>\n",
       "      <th>M</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>title</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>$1,000,000 Duck (1971)</th>\n",
       "      <td>3.375000</td>\n",
       "      <td>2.761905</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>'Night Mother (1986)</th>\n",
       "      <td>3.388889</td>\n",
       "      <td>3.352941</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>'Til There Was You (1997)</th>\n",
       "      <td>2.675676</td>\n",
       "      <td>2.733333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>'burbs, The (1989)</th>\n",
       "      <td>2.793478</td>\n",
       "      <td>2.962085</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...And Justice for All (1979)</th>\n",
       "      <td>3.828571</td>\n",
       "      <td>3.689024</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "gender                                F         M\n",
       "title                                            \n",
       "$1,000,000 Duck (1971)         3.375000  2.761905\n",
       "'Night Mother (1986)           3.388889  3.352941\n",
       "'Til There Was You (1997)      2.675676  2.733333\n",
       "'burbs, The (1989)             2.793478  2.962085\n",
       "...And Justice for All (1979)  3.828571  3.689024"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 按性别查看各个电影的平均评分\n",
    "mean_ratings_gender = data.pivot_table(values='rating', index='title', columns='gender', aggfunc='mean')\n",
    "mean_ratings_gender.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>gender</th>\n",
       "      <th>F</th>\n",
       "      <th>M</th>\n",
       "      <th>diff</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>title</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>$1,000,000 Duck (1971)</th>\n",
       "      <td>3.375000</td>\n",
       "      <td>2.761905</td>\n",
       "      <td>0.613095</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>'Night Mother (1986)</th>\n",
       "      <td>3.388889</td>\n",
       "      <td>3.352941</td>\n",
       "      <td>0.035948</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>'Til There Was You (1997)</th>\n",
       "      <td>2.675676</td>\n",
       "      <td>2.733333</td>\n",
       "      <td>-0.057658</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>'burbs, The (1989)</th>\n",
       "      <td>2.793478</td>\n",
       "      <td>2.962085</td>\n",
       "      <td>-0.168607</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...And Justice for All (1979)</th>\n",
       "      <td>3.828571</td>\n",
       "      <td>3.689024</td>\n",
       "      <td>0.139547</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "gender                                F         M      diff\n",
       "title                                                      \n",
       "$1,000,000 Duck (1971)         3.375000  2.761905  0.613095\n",
       "'Night Mother (1986)           3.388889  3.352941  0.035948\n",
       "'Til There Was You (1997)      2.675676  2.733333 -0.057658\n",
       "'burbs, The (1989)             2.793478  2.962085 -0.168607\n",
       "...And Justice for All (1979)  3.828571  3.689024  0.139547"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 男女意见想差最大的电影 -> 价值观/品味冲突\n",
    "mean_ratings_gender['diff'] = mean_ratings_gender.F - mean_ratings_gender.M\n",
    "mean_ratings_gender.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>gender</th>\n",
       "      <th>F</th>\n",
       "      <th>M</th>\n",
       "      <th>diff</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>title</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tigrero: A Film That Was Never Made (1994)</th>\n",
       "      <td>1</td>\n",
       "      <td>4.333333</td>\n",
       "      <td>-3.333333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Neon Bible, The (1995)</th>\n",
       "      <td>1</td>\n",
       "      <td>4.000000</td>\n",
       "      <td>-3.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Enfer, L' (1994)</th>\n",
       "      <td>1</td>\n",
       "      <td>3.750000</td>\n",
       "      <td>-2.750000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Stalingrad (1993)</th>\n",
       "      <td>1</td>\n",
       "      <td>3.593750</td>\n",
       "      <td>-2.593750</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Killer: A Journal of Murder (1995)</th>\n",
       "      <td>1</td>\n",
       "      <td>3.428571</td>\n",
       "      <td>-2.428571</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Dangerous Ground (1997)</th>\n",
       "      <td>1</td>\n",
       "      <td>3.333333</td>\n",
       "      <td>-2.333333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>In God's Hands (1998)</th>\n",
       "      <td>1</td>\n",
       "      <td>3.333333</td>\n",
       "      <td>-2.333333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Rosie (1998)</th>\n",
       "      <td>1</td>\n",
       "      <td>3.333333</td>\n",
       "      <td>-2.333333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Flying Saucer, The (1950)</th>\n",
       "      <td>1</td>\n",
       "      <td>3.300000</td>\n",
       "      <td>-2.300000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Jamaica Inn (1939)</th>\n",
       "      <td>1</td>\n",
       "      <td>3.142857</td>\n",
       "      <td>-2.142857</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "gender                                      F         M      diff\n",
       "title                                                            \n",
       "Tigrero: A Film That Was Never Made (1994)  1  4.333333 -3.333333\n",
       "Neon Bible, The (1995)                      1  4.000000 -3.000000\n",
       "Enfer, L' (1994)                            1  3.750000 -2.750000\n",
       "Stalingrad (1993)                           1  3.593750 -2.593750\n",
       "Killer: A Journal of Murder (1995)          1  3.428571 -2.428571\n",
       "Dangerous Ground (1997)                     1  3.333333 -2.333333\n",
       "In God's Hands (1998)                       1  3.333333 -2.333333\n",
       "Rosie (1998)                                1  3.333333 -2.333333\n",
       "Flying Saucer, The (1950)                   1  3.300000 -2.300000\n",
       "Jamaica Inn (1939)                          1  3.142857 -2.142857"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mean_ratings_gender.sort_values(by='diff', ascending=True).head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "title\n",
       "$1,000,000 Duck (1971)            37\n",
       "'Night Mother (1986)              70\n",
       "'Til There Was You (1997)         52\n",
       "'burbs, The (1989)               303\n",
       "...And Justice for All (1979)    199\n",
       "dtype: int64"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 活跃电影排行榜\n",
    "ratings_by_movie_title = data.groupby('title').size()\n",
    "ratings_by_movie_title.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "title\n",
       "American Beauty (1999)                                   3428\n",
       "Star Wars: Episode IV - A New Hope (1977)                2991\n",
       "Star Wars: Episode V - The Empire Strikes Back (1980)    2990\n",
       "Star Wars: Episode VI - Return of the Jedi (1983)        2883\n",
       "Jurassic Park (1993)                                     2672\n",
       "Saving Private Ryan (1998)                               2653\n",
       "Terminator 2: Judgment Day (1991)                        2649\n",
       "Matrix, The (1999)                                       2590\n",
       "Back to the Future (1985)                                2583\n",
       "Silence of the Lambs, The (1991)                         2578\n",
       "dtype: int64"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 前十大活跃电影 -> 参与评分人数最多的电影\n",
    "top_ratings = ratings_by_movie_title[ratings_by_movie_title > 1000]\n",
    "top_10_ratings = top_ratings.sort_values(ascending=False).head(10)\n",
    "top_10_ratings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "title\n",
       "Gate of Heavenly Peace, The (1995)                                     5.000000\n",
       "Lured (1947)                                                           5.000000\n",
       "Ulysses (Ulisse) (1954)                                                5.000000\n",
       "Smashing Time (1967)                                                   5.000000\n",
       "Follow the Bitch (1998)                                                5.000000\n",
       "Song of Freedom (1936)                                                 5.000000\n",
       "Bittersweet Motel (2000)                                               5.000000\n",
       "Baby, The (1973)                                                       5.000000\n",
       "One Little Indian (1973)                                               5.000000\n",
       "Schlafes Bruder (Brother of Sleep) (1995)                              5.000000\n",
       "I Am Cuba (Soy Cuba/Ya Kuba) (1964)                                    4.800000\n",
       "Lamerica (1994)                                                        4.750000\n",
       "Apple, The (Sib) (1998)                                                4.666667\n",
       "Sanjuro (1962)                                                         4.608696\n",
       "Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954)    4.560510\n",
       "Shawshank Redemption, The (1994)                                       4.554558\n",
       "Godfather, The (1972)                                                  4.524966\n",
       "Close Shave, A (1995)                                                  4.520548\n",
       "Usual Suspects, The (1995)                                             4.517106\n",
       "Schindler's List (1993)                                                4.510417\n",
       "Name: rating, dtype: float64"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 前二十大高分电影 -> 平均评分最高的电影\n",
    "mean_ratings = data.pivot_table(values='rating', index='title', aggfunc='mean')\n",
    "top_20_mean_ratings = mean_ratings.sort_values(ascending=False).head(20)\n",
    "top_20_mean_ratings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "title\n",
       "American Beauty (1999)                                   4.317386\n",
       "Star Wars: Episode IV - A New Hope (1977)                4.453694\n",
       "Star Wars: Episode V - The Empire Strikes Back (1980)    4.292977\n",
       "Star Wars: Episode VI - Return of the Jedi (1983)        4.022893\n",
       "Jurassic Park (1993)                                     3.763847\n",
       "Saving Private Ryan (1998)                               4.337354\n",
       "Terminator 2: Judgment Day (1991)                        4.058513\n",
       "Matrix, The (1999)                                       4.315830\n",
       "Back to the Future (1985)                                3.990321\n",
       "Silence of the Lambs, The (1991)                         4.351823\n",
       "Name: rating, dtype: float64"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 前十大热闹电影的平均评分 -> 不一定越热闹的电影，评分越高\n",
    "mean_ratings[top_10_ratings.index]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "title\n",
       "Gate of Heavenly Peace, The (1995)                                        3\n",
       "Lured (1947)                                                              1\n",
       "Ulysses (Ulisse) (1954)                                                   1\n",
       "Smashing Time (1967)                                                      2\n",
       "Follow the Bitch (1998)                                                   1\n",
       "Song of Freedom (1936)                                                    1\n",
       "Bittersweet Motel (2000)                                                  1\n",
       "Baby, The (1973)                                                          1\n",
       "One Little Indian (1973)                                                  1\n",
       "Schlafes Bruder (Brother of Sleep) (1995)                                 1\n",
       "I Am Cuba (Soy Cuba/Ya Kuba) (1964)                                       5\n",
       "Lamerica (1994)                                                           8\n",
       "Apple, The (Sib) (1998)                                                   9\n",
       "Sanjuro (1962)                                                           69\n",
       "Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954)     628\n",
       "Shawshank Redemption, The (1994)                                       2227\n",
       "Godfather, The (1972)                                                  2223\n",
       "Close Shave, A (1995)                                                   657\n",
       "Usual Suspects, The (1995)                                             1783\n",
       "Schindler's List (1993)                                                2304\n",
       "dtype: int64"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 前二十大高分电影的热闹程度 -> 不一定评分越高的电影越热闹，可能某个很小众的电影看得人少，但评分很高\n",
    "ratings_by_movie_title[top_20_mean_ratings.index]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "title\n",
       "Shawshank Redemption, The (1994)                                               4.554558\n",
       "Godfather, The (1972)                                                          4.524966\n",
       "Usual Suspects, The (1995)                                                     4.517106\n",
       "Schindler's List (1993)                                                        4.510417\n",
       "Raiders of the Lost Ark (1981)                                                 4.477725\n",
       "Rear Window (1954)                                                             4.476190\n",
       "Star Wars: Episode IV - A New Hope (1977)                                      4.453694\n",
       "Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1963)    4.449890\n",
       "Casablanca (1942)                                                              4.412822\n",
       "Sixth Sense, The (1999)                                                        4.406263\n",
       "Name: rating, dtype: float64"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 十大好电影 -> 活跃度超过 1000 的高分电影\n",
    "top_10_movies = mean_ratings[top_ratings.index].sort_values(ascending=False).head(10)\n",
    "top_10_movies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>rating</th>\n",
       "      <th>hot</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>title</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Shawshank Redemption, The (1994)</th>\n",
       "      <td>4.554558</td>\n",
       "      <td>2227</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Godfather, The (1972)</th>\n",
       "      <td>4.524966</td>\n",
       "      <td>2223</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Usual Suspects, The (1995)</th>\n",
       "      <td>4.517106</td>\n",
       "      <td>1783</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Schindler's List (1993)</th>\n",
       "      <td>4.510417</td>\n",
       "      <td>2304</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Raiders of the Lost Ark (1981)</th>\n",
       "      <td>4.477725</td>\n",
       "      <td>2514</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Rear Window (1954)</th>\n",
       "      <td>4.476190</td>\n",
       "      <td>1050</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Star Wars: Episode IV - A New Hope (1977)</th>\n",
       "      <td>4.453694</td>\n",
       "      <td>2991</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1963)</th>\n",
       "      <td>4.449890</td>\n",
       "      <td>1367</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Casablanca (1942)</th>\n",
       "      <td>4.412822</td>\n",
       "      <td>1669</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Sixth Sense, The (1999)</th>\n",
       "      <td>4.406263</td>\n",
       "      <td>2459</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                      rating   hot\n",
       "title                                                             \n",
       "Shawshank Redemption, The (1994)                    4.554558  2227\n",
       "Godfather, The (1972)                               4.524966  2223\n",
       "Usual Suspects, The (1995)                          4.517106  1783\n",
       "Schindler's List (1993)                             4.510417  2304\n",
       "Raiders of the Lost Ark (1981)                      4.477725  2514\n",
       "Rear Window (1954)                                  4.476190  1050\n",
       "Star Wars: Episode IV - A New Hope (1977)           4.453694  2991\n",
       "Dr. Strangelove or: How I Learned to Stop Worry...  4.449890  1367\n",
       "Casablanca (1942)                                   4.412822  1669\n",
       "Sixth Sense, The (1999)                             4.406263  2459"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 把平均评分和热度综合起来\n",
    "df_top_10_movies = pd.DataFrame(top_10_movies)\n",
    "df_top_10_movies['hot'] = top_ratings[top_10_movies.index]\n",
    "df_top_10_movies"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
